• A new research paper maps the inner workings of the AI model Claude 3 Sonnet, identifying "features" that activate on concepts such as the Golden Gate Bridge. By adjusting the strength of these features, researchers can steer Claude's responses to incorporate the associated concept, demonstrating a new way to modify the behavior of large language models. The stated aim is to improve AI safety by allowing precise adjustment of model behaviors tied to potential risks.
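
The idea of adjusting a feature's strength can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a feature can be represented as a direction in the model's internal activation space, and that steering amounts to adding a scaled copy of that direction to an activation vector. The names `steer`, `feature_activation`, and `golden_gate`, along with the four-dimensional vectors, are hypothetical and chosen only for illustration.

```python
from typing import List


def steer(activation: List[float], feature_direction: List[float], strength: float) -> List[float]:
    """Return the activation shifted along the feature direction by `strength`."""
    if len(activation) != len(feature_direction):
        raise ValueError("activation and feature direction must have the same length")
    return [a + strength * d for a, d in zip(activation, feature_direction)]


def feature_activation(activation: List[float], feature_direction: List[float]) -> float:
    """Measure how strongly a feature is present as a dot product with its direction."""
    return sum(a * d for a, d in zip(activation, feature_direction))


# Toy internal activation of a model at one position (hypothetical values).
activation = [0.5, -1.0, 2.0, 0.0]

# Hypothetical unit-length direction standing in for a "Golden Gate Bridge" feature.
golden_gate = [0.0, 0.0, 0.0, 1.0]

# Turn the feature up: the activation now carries a strong component along it.
steered = steer(activation, golden_gate, strength=10.0)
```

In this toy setup, `feature_activation(activation, golden_gate)` is zero before steering and equals the chosen strength afterwards, while the other components of the activation are left unchanged. A real model would apply such a change inside its forward pass, where it would influence the generated text.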